A Web-Based Chatbot with Advanced Voice Control and Document Interaction Using Flask and JavaScript

Authors: Ashwini Maidamshetti, Ravikanth K

DOI Link: https://doi.org/10.22214/ijraset.2025.73852

Abstract

Chatbots in the last few years have been the subject of growing interest as smart conversational agents for human–computer interaction in various applications. Most current systems, however, have limitations like insufficient flexible voice control, limited document support, and minimal user-friendly capabilities. To overcome these drawbacks, this paper introduces a web-based chatbot system designed using Flask as the backend and JavaScript as the frontend. The system incorporates cutting-edge features like dark mode for user ease, text-to-speech with voice toggle control, selective text reading, microphone-based speech input, and document upload (PDF, DOCX, TXT) for content-based question answering. The backend utilizes the OpenRouter API to provide accurate responses, while the frontend offers an interactive and accessible interface. Experimental evaluation and user feedback emphasize that the suggested system provides greater usability, flexibility, and engagement over traditional chatbots. This research shows the possibility of integrating multi-modal capabilities with lightweight web technologies to enhance user experience in conversational AI systems.

Introduction

Chatbots are AI-powered tools designed to simulate human conversation, commonly used in fields like education, healthcare, customer service, and entertainment. While popular, many existing chatbot systems are limited to text-only interactions, lacking voice control, document handling, and personalization features.

To overcome these issues, the proposed chatbot system introduces a web-based, multi-functional chatbot using Flask (Python) for the backend and JavaScript for the frontend, offering enhanced accessibility, usability, and interactivity.

2. Key Features of the Proposed Chatbot

???? Voice Toggle: Users can turn Text-to-Speech (TTS) on/off.
???? Speech Input: Use microphone for hands-free interaction.
???? Document Upload: Supports PDF, DOCX, and TXT for file-based Q&A.
???? Dark Mode: Enhances user comfort.
????? Selective Text Reading: Only selected chatbot replies are read aloud.
???? Chat History: Option to save or clear conversation.

3. Key Contributions

? Integration of voice, text, and file-based interactions in one platform.
? Focus on user-friendly UI with accessibility features.
? Implementation of document-aware Q&A.
? Comparison showing better usability than traditional chatbot systems.

4. Literature Review – Gaps in Existing Systems

Study	Focus	Limitation
Rule-Based Bots	Predefined answers (Sharma & Gupta)	No flexibility or contextual understanding
Voice-Enabled Bots	Speech accessibility (Mulyadi et al.)	No voice toggle or personalization
Document-Aware Bots	NLP on file content (Chowdhury & Johnson)	High computational cost, poor web integration
Flask-Based Bots	Simple deployment (Kumar)	No advanced interaction or TTS features
Multimodal Bots	Voice + text (Lee)	Lacked selective reading control

Insight: While progress exists, no single system combines text, voice, file handling, and personalization in a lightweight, real-time web application.

5. Methodology

A. Existing Chatbot Types

Rule-Based: Good for static content but inflexible.
Voice-Based: Helps with accessibility but lacks control.
Document-Aware: Smart responses from files but resource-heavy and not web-optimized.

B. Proposed Approach

A unified, lightweight web chatbot offering:

Real-time text/voice communication
Document parsing and intelligent response generation
Personalized UI with toggle options and dark mode

6. System Architecture

Three-Tier Architecture:

Frontend (HTML/CSS/JS): UI with chat interface, voice buttons, dark mode, file upload
Backend (Flask, Python): Manages queries, file parsing, API calls, and chat logic
Intelligence Layer (OpenRouter API): NLP model generates smart, context-aware replies

7. Implementation Details

A. Frontend

Built with JavaScript
Features: TTS toggle, STT input, file upload, dark mode
Uses fetch() to connect to Flask backend

B. Backend

Built with Flask
Routes:
- /generate: Handles queries and gets responses from OpenRouter API
- /upload: Extracts text from uploaded documents (PDF/DOCX/TXT)
Uses pypdf, python-docx for document parsing

C. Data Handling

Lightweight prototype uses in-memory storage for session-based interactions
No database used, but scalable to integrate MongoDB/PostgreSQL

8. Advantages

???? Combines text, speech, and document interaction
? Lightweight and fast (Flask + JS)
???? Secure and customizable
???? Extensible for academic, business, or assistive tech use cases

Conclusion

In this paper, we introduced a Web-Based Chatbot system developed with a lightweight architecture using Flask (Python) for the backend and JavaScript for the frontend. The chatbot supports advanced features like text-to-speech with toggle control, selective reading of text, speech input via microphone, document upload and Q&A (PDF, DOCX, TXT), dark mode, and chat management. Experimental testing and user opinions affirm that the system offers a more engaging, accessible, and user-centered experience than current chatbots that support text-only, voice-only, or document-only interactions. The proposed system illustrates that multi-modal interaction is possible in a lightweight and scalable platform, which makes it viable for real-time deployment in education, business, and customer service. But the existing implementation has some restrictions: 1) Reliance on third-party APIs (OpenRouter) involves some latency. 2) Multiple language support is not available, limiting usage to English-speaking markets. 3) No persistent database integration for long-term chat storage.

References

[1] S. Jain and R. Gupta, “AI-powered Chatbots for Customer Support: A Review,” International Journal of Computer Applications, vol. 176, no. 25, pp. 10–15, 2020. [2] M. Mulyadi and F. Nurprihatin, “Voice-enabled Chatbot for Healthcare Applications Using NLP,” Journal of Theoretical and Applied Information Technology, vol. 99, no. 12, pp. 2678–2689, 2021. [3] A. Chowdhury and S. Das, “Chatbot Implementations in Higher Education: A Comprehensive Review,” Education and Information Technologies, Springer, vol. 27, pp. 7853–7871, 2022. [4] P. Johnson, “Document-Aware Conversational Agents: Advances and Challenges,” IEEE Access, vol. 10, pp. 45612–45625, 2022. [5] R. Kumar and A. Sharma, “Flask-based Academic Chatbot for Student Support,” International Journal of Innovative Research in Computer and Communication Engineering (IJIRCCE), vol. 10, no. 4, pp. 1405–1411, 2022. [6] J. Lee, “Multimodal Chatbots: Integrating Voice and Text for User Engagement,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 12, no. 3, pp. 1–20, 2023. [7] Mozilla Developer Network (MDN), “Web Speech API Documentation,” [Online]. Available: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API, Accessed: Feb. 2025. [8] OpenRouter, “OpenRouter API Documentation,” [Online]. Available: https://openrouter.ai/docs, Accessed: Feb. 2025. [9] Python Software Foundation, “Flask Documentation,” [Online]. Available: https://flask.palletsprojects.com, Accessed: Feb. 2025. [10] S. Patel, “Text-to-Speech Synthesis for Web-based Chat Applications,” International Journal of Emerging Trends in Engineering Research (IJETER), vol. 11, no. 2, pp. 55–62, 2023.

Copyright

Copyright © 2025 Ashwini Maidamshetti, Ravikanth K. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET73852

Publish Date : 2025-08-26

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here